RRR: Rank-Regret Representative
Authors
Abstract
Selecting the best items in a dataset is a common task in data exploration. However, the concept of “best” lies in the eye of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove “dominated” items and create a “representative” subset of the dataset, comprising the “best items” in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full dataset. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users’ “regret”. Existing work defines regret as the loss in score incurred by limiting consideration to the representative instead of the full dataset, for any chosen ranking function. However, the score is often not a meaningful number, and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the dataset. In contrast, users do understand the notion of rank ordering. Therefore, we instead define regret by the position of items in the ranked list, and propose the rank-regret representative: the minimal subset of the data containing at least one of the top-k items of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions, and use notions from combinatorial geometry to develop effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.
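The rank-based notion of regret described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes linear ranking functions with non-negative weights and higher scores being better (neither is stated in the abstract), and the names `rank_of_best` and `estimated_rank_regret` are hypothetical. Because it samples ranking functions at random, the result is only a lower bound on the true rank-regret of the subset.

```python
import random


def rank_of_best(items, subset_idx, weights):
    """Best 1-based rank reached by any subset item under one linear scoring function.

    Assumption: score = dot(weights, item), and a higher score ranks earlier.
    """
    scores = [sum(w * x for w, x in zip(weights, item)) for item in items]
    best_in_subset = max(scores[i] for i in subset_idx)
    # Rank = 1 + number of items in the full dataset that score strictly higher.
    return 1 + sum(1 for s in scores if s > best_in_subset)


def estimated_rank_regret(items, subset_idx, num_functions=1000, seed=0):
    """Worst rank_of_best over randomly sampled non-negative weight vectors.

    Sampling cannot cover every ranking function, so this is a lower
    bound on the subset's true rank-regret, not an exact value.
    """
    rng = random.Random(seed)
    dim = len(items[0])
    worst = 1
    for _ in range(num_functions):
        weights = [rng.random() for _ in range(dim)]
        worst = max(worst, rank_of_best(items, subset_idx, weights))
    return worst


# Toy two-attribute dataset (made up for illustration).
items = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.1, 0.1)]
regret_of_middle_item = estimated_rank_regret(items, [2])
```

In this toy dataset the single item `(0.6, 0.6)` is never outscored by both extreme items at once, so it always lies in the top 2. A subset with estimated rank-regret at most k is a candidate representative in the abstract's sense, and the problem is to find the smallest such subset.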
Similar resources
Robustness in portfolio optimization based on minimax regret approach
Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of this research attempts to make Markowitz’s primary portfolio selection model more realistic or seeks to solve the model to obtain fairly optimal portfolios. An efficient frontier in the ...
Stability of dietary patterns assessed with reduced rank regression; the Zutphen Elderly Study
Background: Reduced rank regression (RRR) combines exploratory analysis with a-priori knowledge by including risk factors in the model. Dietary patterns, derived from RRR analysis, can be interpreted by the chosen risk factor profile and give an indication of positive or adverse health effects for a specific disease. Our aim was to assess the stability of dietary patterns derived by RRR over tim...
Low-rank Bandits with Latent Mixtures
We study the task of maximizing rewards from recommending items (actions) to users sequentially interacting with a recommender system. Users are modeled as latent mixtures of C many representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unkno...
Cointegrating rank selection in models with time-varying variance
Reduced rank regression (RRR) models with time-varying heterogeneity are considered. Standard information criteria for selecting cointegrating rank are shown to be weakly consistent in semiparametric RRR models in which the errors have general nonparametric short memory components and shifting volatility, provided the penalty coefficient Cn → ∞ and Cn/n → 0 as n → ∞. The AIC criterion is inconsis...
Multivariate reduced rank regression in non-Gaussian contexts, using copulas
We propose a new procedure to perform Reduced Rank Regression (RRR) in non-Gaussian contexts, based on Multivariate Dispersion Models. Reduced-Rank Multivariate Dispersion Models (RR-MDM) generalise RRR to a very large class of distributions, which includes continuous distributions like the normal, Gamma, and Inverse Gaussian, and discrete distributions like the Poisson and the binomial. A multivaria...
Journal title:
- CoRR
Volume abs/1802.10303, Issue
Pages -
Publication date: 2018